Chapter 22
Comparing Survival Times
IN THIS CHAPTER
Using the log-rank test to compare two groups
Thinking about more complicated ways to compare the survival experience
Calculating the necessary sample size to compare survival times
The life table and Kaplan-Meier survival curves described in Chapter 21 are ideal for summarizing
and describing the time to the first or only occurrence of a particular event based on times observed in
a sample of individuals. They correctly incorporate data that reflect when an individual is observed
during the study but does not experience the event, which is called censored data. Animal and human
studies involving endpoints that occur on a short time-scale, like measurements taken during an
experimental surgical procedure, may yield totally uncensored data. However, the more common
situation is that during the observation period of studies, not all individuals experience the event, so
you usually have censored data on your hands.
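To make the idea of censoring concrete, here is a minimal Python sketch of how survival data are often recorded: one follow-up time per individual, plus a flag saying whether the event was actually observed. The names Observation and make_observation, the 12-month study length, and the four example times are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    time: float   # follow-up time (here, months since enrollment)
    event: bool   # True if the event was observed, False if censored


def make_observation(event_time: Optional[float], study_end: float) -> Observation:
    """Record one individual's outcome, censoring at the end of the study.

    event_time is None when the individual never experienced the event.
    """
    if event_time is not None and event_time <= study_end:
        return Observation(event_time, True)
    # The event did not occur during observation, so all we know is that
    # the individual was still event-free at study_end.
    return Observation(study_end, False)


# Hypothetical 12-month study with four individuals.
sample = [make_observation(t, 12.0) for t in [3.0, None, 15.0, 8.5]]
n_censored = sum(not obs.event for obs in sample)
```

In this sketch, the individual whose event would have occurred at 15 months is treated the same as the one who never had the event: both are recorded as censored at 12 months.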
In biological research and especially in clinical trials (discussed in Chapter 5), you often want to
compare survival times between two or more groups of individuals. In humans, this may have to do
with survival after cancer surgery. In animals, it may have to do with testing the toxicity of a potential
therapeutic. This chapter describes an important method for comparing survival curves between two
groups called the log-rank test, and explains how to calculate the sample size you need to have
sufficient statistical power for this test (see Chapter 3). The log-rank test can be extended to handle
three or more groups, but this discussion is beyond the scope of this book.
In this chapter, as in Chapters 21 and 23, we use the term survival in reference to the outcome
of death. However, all the calculations pertain to any type of outcome event being studied,
including good ones, such as cancer going into remission.
There is some ambiguity associated with the name log-rank test. It has also gone by
other names (such as the Mantel-Cox test), and has been extended into variants such as the
Gehan-Breslow test. You may also observe that different software packages calculate the log-rank
test slightly differently. In this chapter, we describe the most commonly used form of the log-rank
test.
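As a rough illustration of why implementations can differ, here is a minimal Python sketch of one common formulation of the two-group log-rank test: observed minus expected events in group 1, summed over the distinct event times, with a hypergeometric variance and a chi-square statistic on 1 degree of freedom. This is an assumption about which form is meant, not necessarily the exact calculation used by any particular software package, and the function name log_rank_test is hypothetical.

```python
import math


def log_rank_test(times1, events1, times2, events2):
    """Two-group log-rank test (one common form, hypergeometric variance).

    times*  : follow-up time for each individual
    events* : True if the event was observed, False if censored
    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    data = [(t, e, 1) for t, e in zip(times1, events1)]
    data += [(t, e, 2) for t, e in zip(times2, events2)]

    # Only times at which at least one event occurred contribute.
    event_times = sorted({t for t, e, _ in data if e})

    o_minus_e = 0.0   # sum of (observed - expected) events in group 1
    variance = 0.0
    for t in event_times:
        n1 = sum(1 for u, _, g in data if g == 1 and u >= t)  # at risk, group 1
        n2 = sum(1 for u, _, g in data if g == 2 and u >= t)  # at risk, group 2
        d1 = sum(1 for u, e, g in data if g == 1 and u == t and e)
        d2 = sum(1 for u, e, g in data if g == 2 and u == t and e)
        n, d = n1 + n2, d1 + d2
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)

    chi_square = o_minus_e ** 2 / variance
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical data: every event in group A occurs before any in group B.
chi_sq, p = log_rank_test([1, 2, 3, 4, 5], [True] * 5,
                          [6, 7, 8, 9, 10], [True] * 5)
```

Variants such as the Gehan-Breslow test weight each event time differently (for example, by the number at risk), which is one reason results can differ between programs.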
If you have no censored observations in your data, you can skip most of this chapter. This may